[Kafka #20707] KAFKA-19648; Cluster metadata bootstrapping with kraft checkpoint#30
Conversation
…ache#21957) Ensures that all Metered Session-stores (plain and headers) pass headers into de/serializers. Reviewers: Uladzislau Blok <blokv75@gmail.com>, TengYao Chi <frankvicky@apache.org>
…itions being revoked (apache#21897) This addresses race conditions where the app thread could collect/return records for revoked partitions. Fix by ensuring that the app thread does not return buffered records if it hasn't checked pending reconciliations. Once it checked pending reconciliations, we know that partitions being revoked were marked as non-fetchable (so it's when we can safely move onto fetching/collecting in the app thread). Also ensure that background reconciliations do not trigger revocations (the app thread could already have records in memory, collected from the buffer, for those partitions, which would lead to the consumer returning records for revoked partitions if the background completes the revocation before the app thread returns). With these fixes we are sure that the app thread only collects/returns records after it has marked revoked partitions as non-fetchable. This fix applies to the consumer only (share consumer remains unchanged with this PR, can trigger full reconciliation & assignment update from the background) Reviewers: Andrew Schofield <aschofield@confluent.io>, nileshkumar3 <nileshkumar3@gmail.com>, PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>, Kirk True <ktrue@confluent.io>
…-commit (apache#21424) Document that unsubscribe() doesn't commit offsets even with auto-commit enabled, and add test to verify this behavior Reviewers: Lianet Magrans <lmagrans@confluent.io>, David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Summary: fix the typo `addResponse` to `removeResponse` to align the meaning of the code. Reviewers: Ken Huang <s7133700@gmail.com>, Jiayao Sun <jiayao.s@outlook.com>, Chia-Ping Tsai <chia7712@gmail.com>
Fix log statement to use the local variable `defaultApiTimeoutMs` instead of the uninitialized field `this.defaultApiTimeoutMs`. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…apache#21941) For headers-store with caching enabled, we want to keep IQv2 disabled until we implement IQv2 across the board. Thus, we need to introduce CachingKeyValueStoreWithHeaders and just forward any query to the underlying store, instead of re-using the existing query implementation on CachingKeyValueStore. Reviewers: Matthias J. Sax <matthias@confluent.io> --------- Co-authored-by: Matthias J. Sax <mjsax@apache.org>
…pache#21977) Fix repeated word `not` in ConnectException messages. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
We previously only added support to query headers stores as plain- or ts-kv-stores. This PR exposes headers-store with their native type. Reviewers: Bill Bejeck <bbejeck@apache.org>
…1997) Minor follow-up to ensure we handle wake-up when checking reconciliation future in poll. Reviewers: Andrew Schofield <aschofield@confluent.io>, TengYao Chi <frankvicky@apache.org>
Adds missing method TopologyTestDriver.getSessionStoreWithHeader(), and updates TopologyTestDriverTest for all newly added header store types. Reviewers: TengYao Chi <frankvicky@apache.org>, Alieh Saeedi <asaeedi@confluent.io>
* DLQ support for share groups KIP-1191 will be gated by a `share.version` upgrade - slated for 4.4 * In this PR, we have added the appropriate changes and updated tests in `FeatureCommandTest` and `ApiVersionsRequestTest` Reviewers: Andrew Schofield <aschofield@confluent.io>
) This PR adds withHeaders parameterization to Kafka Streams integration Tests to verify that stream operations work correctly with both DSL store format configurations (regular and headers-based format). Changes: - Added `withHeaders` boolean parameter to test methods - Tests now run with both `withHeaders=false` and `withHeaders=true` - When `withHeaders=true`, tests configure `StreamsConfig.DSL_STORE_FORMAT_CONFIG` to `StreamsConfig.DSL_STORE_FORMAT_HEADERS` - For tests that produce data in @BeforeAll, moved data production to @beforeeach to ensure fresh data for each parameterized test run - Added topic cleanup in @AfterEach for proper isolation between test runs This ensures comprehensive test coverage for foreign key joins across different store formats. Reviewers: Matthias J. Sax <matthias@confluent.io>, TengYao Chi <frankvicky@apache.org>
…21540) Description: This change fixes a bug in recursive parent-node matching where a previously found ancestor match could be overwritten by a later branch returning null. The issue occurs when traversing multiple parents. if an earlier parent path finds a matching ancestor but a later parent path does not, the previous implementation could return null instead of preserving the first valid match. This can lead to incorrect ancestor detection during topology optimization for merged streams and may prevent expected repartition-topic sharing behavior. The fix returns immediately when a recursive parent traversal finds a non-null match, and returns null only after all parent branches have been checked with no match. Testing strategy: - Added a unit test that reproduces the multi-parent traversal case and verifies the matching ancestor is preserved. - Added topology optimization tests for merged streams with key-changing operations on both left and right branches, verifying repartition-topic sharing still produces a single repartition node. Reviewers Nikita Shupletsov <nshupletsov@confluent.io> , Bill Bejeck <bbejeck@apache.org>
…(with ref to DLQ) (apache#21606) KAFKA-19863: Add documentation on how to implement a custom exception(with ref to DLQ) This PR updates the Streams documentation to include guidance on implementing a custom exception handler with built-in Dead Letter Queue (DLQ support introduced by KIP-1034. Reviewers Nikita Shuplestov <nshuplestov@confluent.io>, Bill Bejeck <bbejeck@apache.org>
…#22004) WindowHeaders-supplier must implement HeadersBytesStoreSupplier. Reviewers: Bill Bejeck <bbejeck@apache.org>, Alieh Saeedi <asaeedi@confluent.io>, TengYao Chi <frankvicky@apache.org>
…r share fetch (apache#22005) The delayed share fetch triggers waiting requests for same topic partition in purgatory for which the request has been completed. This is needed as for same sahre partition there can be requests in purgatory waiting to acquire lock. However, the triggers are always not needed i.e. when no data is available in partition hence these triggers can be avoided. Reviewers: Andrew Schofield <aschofield@confluent.io>
… background (apache#21959) https://issues.apache.org/jira/browse/KAFKA-20089 Changing the error code needs a kip. For now updating javadocs. A kip will be proposed later. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
To address GHSA-2m67-wjpj-xhg9 Reviewers: Luke Chen <showuon@gmail.com>
…ion (apache#22003) All GroupConfig fields are now Optional<T>, storing only explicitly provided values. Broker-level defaults are resolved at access time via flatMap().orElse(brokerDefault), eliminating stale-capture issues when broker configs change dynamically. Key changes: - All 21 GroupConfig fields are private Optional, using optionalInt/Boolean/String helpers based on originals(). - GroupConfigManager no longer needs a defaultConfig; constructor simplified. - GroupCoordinatorConfig.extractGroupConfigMap(ShareGroupConfig) removed. - All consumers (GroupMetadataManager, ShareGroupConfigProvider, KafkaApis) use flatMap. - validateValues refactored with validateIntRange/Max/Min helpers operating on a single filtered parsed map. - Cross-field checks use broker defaults for missing values. Reviewers: Sean Quah <squah@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
apache#21983) When a broker's IP address changes (e.g., pod replacement in Kubernetes), `TelemetrySender.stickyNode` retains a stale `Node` object with the old address. Since `canSendRequest()` checks by node ID only, the stale connection passes the check and telemetry data is sent to the wrong host. This PR refreshes `stickyNode` against current metadata at the start of `TelemetrySender.maybeUpdate()`. If the node's address has changed, `stickyNode` is updated to the fresh `Node`. If the node no longer exists in metadata, `stickyNode` is cleared and reconnect backoff is returned. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com> --------- Signed-off-by: Daeho Kwon <trewq231@naver.com>
Upgrades Gradle from 9.2.1 to 9.4.1 and the Spotless plugin from 8.0.0
to 8.3.0.
The upgrade was previously blocked because Spotless' default
`removeUnusedImports()` relies on google-java-format, which depends on
`com.sun.tools.javac.*` internals and breaks under the newer
Gradle/Spotless worker setup. Worked around by switching to the
`cleanthat-javaparser-unnecessaryimport` engine, which uses JavaParser
and does not touch JDK internals:
```
removeUnusedImports('cleanthat-javaparser-unnecessaryimport')
```
The wrapper files (`gradle/wrapper/gradle-wrapper.properties`,
`gradlew`, `wrapper.gradle`) were regenerated following the procedure
documented in `gradle/wrapper/README.md`, using the wrapper-jar and
binary-distribution checksums published at
https://gradle.org/release-checksums/.
Verified locally:
- `./gradlew --version` reports Gradle 9.4.1
- `./gradlew build -x test` passes (spotlessCheck + checkstyle +
spotbugs + compile all green)
- `./gradlew spotlessApply --rerun-tasks --no-build-cache` ran 100 times
in a loop — 0 dirty iterations, 0 build failures, confirming the
cleanthat engine produces stable output under Gradle's parallel workers
on this codebase
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang
<payang@apache.org>, Ken Huang <s7133700@gmail.com>, Dejan Stojadinović
<dejan2609@gmail.com>
…#21866) This patch adds missing Control Batch types in the documentation. Reviewers: Luke Chen <showuon@gmail.com>, Andrew Schofield <aschofield@apache.org>, Mickael Maison <mickael.maison@gmail.com> --------- Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
…ache#21920) https://issues.apache.org/jira/browse/KAFKA-20363 The test had a race condition between Test thread and Queue thread. We actually wait until queue thread is sleeping/waiting so it guarantees the order. Reviewers: junvelop <151982401+junjunclub@users.noreply.github.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Migrate AlterPartitionManager from Scala to Java in the `org.apache.kafka.server.partition` package, including the interface, `DefaultAlterPartitionManager` implementation, and `AlterPartitionItem` record. Also migrate `AlterPartitionManagerTest` and move `MockAlterPartitionManager` from `TestUtils.scala` to `org.apache.kafka.server.util`. Reviewers: Mickael Maison <mickael.maison@gmail.com>
…ing for Mockito upgrade (apache#21857) `testRecordPruningTaskPeriodicityWithSomeFailures` fails with Mockito 5.21+ because unstubbed `CompletableFuture` methods now return`completedFuture(null)` instead of `null`, so unstubbed `deleteRecords` calls succeed instead of being no-ops. Changed `deleteRecords` stubs from offset-based matching to partition-based matching, which better reflects the test intent (tp1 succeeds, tp2 fails) and removes dependency on Mockito's default return behavior. Upgraded Mockito from 5.20.0 to 5.23.0 as the issue description allows 5.21.0 or more recent versions. Verified locally that both 5.21.0 and 5.23.0 pass all `ShareCoordinatorServiceTest` tests. Reviewers: Andrew Schofield <aschofield@confluent.io>, Sushant Mahajan <smahajan@confluent.io>, nileshkumar3 <nileshkumar3@gmail.com>
This PR adds a possibility to configure memory limit within system
tests.
### Why?
Right now, anyone who needs to a different memory limit in CI has to do
something like:
```java
sed -i
's/docker_run_memory_limit="2000m"/docker_run_memory_limit="4000m"/'
tests/docker/ducker-ak
```
with is not so UX friendly and it can break anytime if anyone changes
the content of the `ducker-ak`. With such change we can easily do:
```bash
./tests/docker/ducker-ak up --memory 4000m
```
or even
```bash
DUCKER_RUN_MEMORY=4000m ./tests/docker/run_tests.sh
```
So there is no need for using sed etc.
> [!NOTE] > I have designed this that CLI will always have bigger
priority than ENV so we are consistent across whole Kafka repo (i.e., if
I express this in relation ... CLI > ENV > default option)
### Tested
1. ./tests/docker/ducker-ak up --help
2. DUCKER_RUN_MEMORY=4000m ./tests/docker/ducker-ak up -n 2 --force
3. ./tests/docker/ducker-ak up -n 2 --force --memory 8000m
4. DUCKER_RUN_MEMORY=4000m ./tests/docker/ducker-ak up -n 2 --force
--memory 8000m
5. DUCKER_RUN_MEMORY=4000m ./tests/docker/run_tests.sh
6. ./tests/docker/run_tests.sh (points to default)
7. ./tests/docker/ducker-ak up (points to default)
and then also verified (in each) :
```bash
podman inspect ducker01 --format '{{.HostConfig.Memory}}'
```
Everything looks good and without issues.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…ng revoked (apache#22015) The `Assignment` field in the `ConsumerGroupDescribe` response only included `assignedPartitions`, omitting `partitionsPendingRevocation`. This caused partitions to disappear from the response during reconciliation. The fix merges both into the `Assignment` field since the member is still responsible for those partitions until revocation completes. Reviewers: Sean Quah <squah@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
…sts (apache#21998) Adds `SelfManagedOffsetRecoveryIntegrationTest` and `RocksDBStoreTestingUtils` Tests verify Kafka Streams recovers correctly after unclean shutdowns when offsets are stored in RocksDB column families (KIP-1035). Covers: - Unclean shutdown recovery (ALOS and EOS) - Missing offsets triggering task corruption detection - Combined status=open + missing offsets - EOS exactly-once guarantees preserved after recovery - Multi-store partial corruption - Standby task recovery - KAFKA-19712 regression (standby TaskCorruptedException after state wipe) Reviewers: Matthias J. Sax <matthias@confluent.io>
…#21793) * PR adds code in `SharePartition` to support DLQ of records (batch and per offset) which have been REJECT acknowledged by the client. * Appropriate unit tests have been added where applicable. * The dynamic `ShareVersion` feature has been used to detect DLQ support. * Current impl hardcodes `NoOpShareGroupDLQManager`. This will be updated to specific implementation in future PRs. Reviewers: Andrew Schofield <aschofield@confluent.io>
…he#22022) Replace non-breaking spaces with regular spaces. Replace Record with ProducerRecord to make the code compilable. Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
…he#22287) Document when the flag applies, that existing ACLs are unchanged, and that production clusters should prefer explicit ACLs with the default. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Sushant Mahajan <smahajan@confluent.io> Co-authored-by: Hana1025 <huikewong@gmail.com>
Clarifies the log compaction docs by distinguishing between the log cleaner being enabled and compaction being applied only to topics whose cleanup policy includes `compact`. It also calls out the topic-level `cleanup.policy=compact` override separately from the broker-level `log.cleanup.policy=compact` default. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Sushant Mahajan <smahajan@confluent.io>
…he#22329) Ref : https://issues.apache.org/jira/browse/KAFKA-18810 Marked the producer-throttle assertion as "wait until it's true" instead of checking immediately Reviewers: nileshkumar3 <nileshkumar3@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…-defined (apache#22302) Previously, `internal group configs` with a `broker-level synonym` were always included in the group config map, exposing them even when the user had never explicitly set them. This change skips such configs unless the user has configured them either via the broker synonym or directly at the group level. Reviewers: Sean Quah <squah@confluent.io>
…LogDirTest) (apache#22306) Migrate `AlterLogDirTest` to the new test infrastructure Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang <payang@apache.org>, TaiJuWu <tjwu1217@gmail.com>
) - Moves AutoTopicCreationManager and AutoTopicCreationManagerTest from core to server to align with ownership. - Moves buildEnvelopeRequest to ForwardingManagerUtils.java - Removes the unused `controllerMutationQuota` parameter from AutoTopicCreationManager#createTopics. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Lucas Brutschy <lbrutschy@confluent.io>
…pache#22308) The `stateManager` field is `final` and assigned in both constructors of `DefaultStatePersister`, so the `if (stateManager != null)` guard in `stop()` can never evaluate to false. No behavior change; existing tests in `DefaultStatePersisterTest` remain sufficient. Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…on in group coordinator (apache#22264) The `deserialize*` methods in `ConsumerProtocol` only catch `BufferUnderflowException` and re-wrap it as `SchemaException`. Malformed bytes can also surface as other `RuntimeException`s (e.g. `IllegalArgumentException` from negative array lengths in `ByteBufferAccessor`), which escape these methods. Callers that only guard against `SchemaException` then propagate the failure, which can destabilize the group coordinator. ### Changes In `ConsumerProtocol`, broaden the `catch` clause from `BufferUnderflowException` to `RuntimeException` in all four deserialization entry points, re-wrapping as `SchemaException`: - `deserializeSubscription` - `deserializeConsumerProtocolSubscription` - `deserializeAssignment` - `deserializeConsumerProtocolAssignment` Reviewers: Sean Quah <squah@confluent.io>, David Jacot <david.jacot@gmail.com>
…roup protocols (apache#22331) In TransactionsTest, there are several duplicate tests which are not really required to run for CONSUMER group protocol, as they are using consumer to subscribe, assign and poll which is protocol agnostic path Reviewers: PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…P-1241) (apache#20913) [JIRA:19893](https://issues.apache.org/jira/browse/KAFKA-19893) [KIP:1241](https://cwiki.apache.org/confluence/x/A4LMFw) Currently, Kafka uploads all non-active local log segments to remote storage even when they are still within the local retention period, resulting in redundant storage of the same data in both tiers. <img width="1503" height="772" alt="image" src="https://github.com/user-attachments/assets/55e95e2e-4ab0-4ab9-b28b-871760f331fa" /> This wastes storage capacity (cost) without providing immediate benefits,since reads during the retention window prioritize local data. However, some users/topics do real-time analytics based on remote storage directly and need the latest data to be available as soon as possible (In fact, it only tries to stay as up-to-date as possible, but it still can’t include the latest data because the active segment hasn’t been uploaded yet.). Therefore, this optimization is offered as a **topic's optional configuration** rather than the default behavior. Here are some additional thoughts/considerations. 1. Local files won’t be deleted until they’ve been uploaded to the remote storage, so this change is very safe—you don’t need to worry about files being cleaned up before they be upload to the remote. 2. Considering the latency of remote storage, the local retention period won’t be set too short. For example, in our production environment, we keep 1 day of local data alongside 3-7 days in remote storage, so there’s still 1 day of redundancy. Example for the goal: <img width="797" height="520" alt="image" src="https://github.com/user-attachments/assets/be6725f1-02e7-4b09-aea9-7ce3bbb5e227" /> Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com> --------- Signed-off-by: stroller <fujian1115@gmail.com> Signed-off-by: Jian <fujian1115@gmail.com> Signed-off-by: stroller.fu <stroller.fu@zoom.us> Co-authored-by: stroller.fu <stroller.fu@zoom.us>
…Test to new test infra (apache#22343) Migrate `ReassignReplicaMoveTest` and `ReassignReplicaExpandTest` to new test infra, and integrated into a single test file. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This is a follow-up PR for apache#20913 (comment) Thanks. cc @kamalcph Reviewers: Murali Basani <muralidhar.basani@aiven.io>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
props -> props3 Reviewers: Chia-Ping Tsai <chia7712@gmail.com> Signed-off-by: PoAn Yang <payang@apache.org>
…d removeReconfigurable in AbstractKafkaConfig (apache#22342) Ref: apache#22302 (comment). Convert `addReconfigurable` and `removeReconfigurable` from abstract to default no-op methods so tests no longer need to supply dummy overrides. Reviewers: Ken Huang <s7133700@gmail.com>, Murali Basani <muralidhar.basani@aiven.io>, Chia-Ping Tsai <chia7712@gmail.com>
…e#22361) Migrate DeleteSegmentsTest to new test infra Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…he#22360) Sometimes the system test `PerformanceServiceTest` could fail with 0 records consumed by share consumer because the perf test command times out in 10 seconds (default) and the partition assignment might not have happened for the share member. This is because assignment does not happen until share partitions are initialized in the share coordinator which is dynamically loaded when the FindCoordinator request is sent. The share group state topic too is created by auto topic manager and is not present in a cold start cluster. Decreasing the share goup heartbeat interval solves the issue as more heartbeats get sent in the 10 second interval. Default heartbeat interval is 5 seconds so test could time out before getting an assignment. Reviewers: Andrew Schofield <aschofield@confluent.io>
5.0.0 was removed by apache/infrastructure-actions@9ef334e Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Migrate DeleteTopicTest to new test infra. Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Migrate `ListOffsetsTest` to the new test infrastructure Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…n from task.prepareCommit (apache#22282) 1. Fix TaskManager.handleRevocation to always suspend revoked tasks, even when prepareCommit throws (e.g. TaskMigratedException from producer.send during cache flush). Previously the exception propagated uncaught, skipping the suspend loop entirely. This left tasks in RUNNING state, which caused a downstream IllegalStateException when handleAssignment tried to close them. 2. Wrap prepare/commit/postCommit in try-finally so the suspend loop and task unlock are guaranteed to execute regardless of where an exception occurs. 3. Preserve all exceptions via addSuppressed instead of silently dropping later exceptions. The first exception remains the primary thrown exception for backward compatibility, but subsequent exceptions (e.g. the IllegalStateException from closing an unsuspended task) are now attached as suppressed exceptions instead of lost. 4. Updated all callers in shutdown(), tryCloseCleanActiveTasks(), and handleRevocation to use maybeSetFirstException consistently. This is a behavior change: secondary failures during shutdown and task close are now visible as suppressed exceptions instead of being lost. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Updates docs for KIP-1271 for the new header state-stores. Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Adding 4.3.0 to the system tests as per the post release instructions: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=34840886#ReleaseProcess-Afterrelease I'll follow up with 2 other PRs for the core and streams system tests updates. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
It seems we did not add 4.2.0 when it was released, so fixed that too. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Remove outdated comment Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…response handling (KIP-1319) (apache#22265) `TxnOffsetCommitHandler.handleResponse` in `TransactionManager` folded the response into a `Map<TopicPartition, Errors>` via `TxnOffsetCommitResponse.errors()` before processing it. This loses the response's topic-level structure, which v6+ (KIP-1319) needs in order to resolve topic IDs back to names. This patch switches the handler to iterate directly over `response.data().topics()` and `responseTopic.partitions()`, so the per-topic structure is preserved. The error-handling switch ladder is unchanged; only the iteration shape is different. The now-unused `TxnOffsetCommitResponse.errors()` accessor is removed along with its two test usages. This is a pure refactor: at v0-5, the response topic always carries the topic name, so `new TopicPartition(responseTopic.name(), responsePartition.partitionIndex())` reproduces what `errors()` would have returned. The v6+ wiring that resolves topic IDs back to names will be added in a follow-up patch. Reviewers: Lianet Magrans <lmagrans@confluent.io>
…50 (apache#22258) Updates docs/operations/monitoring.md to include the `num-keys` metric for in-memory state stores introduced by KIP-1250 KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1250%3A+Add+metric+to+track+size+of+in-memory+state+stores Reviewers: Bill Bejeck <bbejeck@apache.org>
…fra (apache#22377) Migrate DisableRemoteLogOnTopicTest and EnableRemoteLogOnTopicTest to new test infra. Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…hangs (apache#18142) The call to `backingStore.get()` (called by connector task threads through `OffsetStorageReaderImpl.offsets()`) can block for long time waiting for data flush to complete (`KafkaProducer.flush()`). This change moves that call outside the synchronized clause that holds `offsetReadFutures`, so that if `backingStore.get()` hangs then it does not keep `offsetReadFutures` locked. The access to `closed` flag (`closed.get()`) is kept inside the synchronize clause to avoid race condition with `close()`. This is important because `OffsetStorageReaderImpl.close()` needs to lock `offsetReadFutures` as well in order to cancel the futures. Since the herder thread calls `OffsetStorageReaderImpl.close()` when attempting to stops a task, before this change this was resulting in the herder thread hanging indefinetely waiting for `backingStore.get()` to complete. Reviewers: Greg Harris <greg.harris@aiven.io>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
…eSegmentTest to new test infra (apache#22376) Migrate `ReassignReplicaShrinkTest` and `RollAndOffloadActiveSegmentTest` to the new test infrastructure. Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…ache#20707) Previously, bootstrap metadata was stored in a separate bootstrap.checkpoint file, while the zero checkpoint contained only KRaft control records. This change unifies them by having the Formatter append bootstrap metadata records into the zero checkpoint alongside the existing KRaft control records, integrating with KRaft's bootstrapping checkpoint mechanisms like RaftClient.Listener#handleLoadBootstrap and KIP-630 snapshot lifecycle management. QuorumController's handleLoadBootstrap now extracts bootstrap records from the zero checkpoint and stores them as BootstrapMetadata, which is later committed by ActivationRecordsGenerator when the controller activates on an empty metadata log. The BootstrapDirectory class is removed and its functionality consolidated into static methods on BootstrapMetadata#fromDirectory reads from the legacy bootstrap.checkpoint (falling back to defaults), and fromCheckpointFile reads from a specific checkpoint path. StorageTool now only writes the bootstrap snapshot when the node has the Controller role. KafkaClusterTestKit is updated to pass non-feature versions, non-SCRAM bootstrap records to the Formatter as additional bootstrap records. Reviewers: José Armando García Sancio <jsancio@apache.org>, Kevin Wu <kevin.wu2412@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replicated from Apache Kafka PR apache#20707
Original PR: apache#20707
This PR is part of a demo/workshop setup showcasing AI SRE capabilities.